Mining linguistic tone patterns with symbolic representation

Author

  • Shuo Zhang
Abstract

This paper introduces speech prosody data mining and its potential application in data-driven phonology/phonetics research. We first conceptualize Speech Prosody Mining (SPM) in a time-series data mining framework. Specifically, we propose using efficient symbolic representations for speech prosody time-series similarity computation. We experiment with both symbolic and numeric representations and distance measures in a series of time-series classification and clustering experiments on a dataset of Mandarin tones. Evaluation results show that symbolic representation performs comparably with other representations at a reduced cost, which enables us to efficiently mine large speech prosody corpora while opening up the possibility of using a wide range of algorithms that require discrete-valued data. We discuss the potential of SPM using time-series mining techniques in future work.
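The abstract does not say which symbolic representation is used. As a rough illustration of the general idea only, the sketch below assumes a SAX-style scheme: z-normalize a pitch contour, average it over a few segments, and map each segment mean to a letter. The function names (`to_symbols`, `hamming`), the four-letter alphabet, the segment count, and the synthetic contours are all hypothetical choices made for this example, not details from the paper.

```python
import numpy as np

# Assumption: 4-letter alphabet with breakpoints that split a standard
# normal distribution into four equiprobable regions (SAX-style).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def to_symbols(contour, n_segments=4):
    """Turn a numeric pitch contour into a short symbolic string."""
    x = np.asarray(contour, dtype=float)
    std = x.std()
    # z-normalize so contours from different pitch ranges are comparable;
    # a perfectly flat contour maps to all zeros.
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Piecewise aggregate approximation: one mean per segment.
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    # Each segment mean becomes the letter of the region it falls into.
    return "".join(ALPHABET[i] for i in np.searchsorted(BREAKPOINTS, paa))


def hamming(a, b):
    """Count positions where two equal-length symbol strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


# Synthetic stand-ins for a rising and a falling tone contour (values in Hz).
rising = np.linspace(100, 200, 20)
falling = np.linspace(200, 100, 20)

print(to_symbols(rising))   # abcd
print(to_symbols(falling))  # dcba
print(hamming(to_symbols(rising), to_symbols(falling)))  # 4
```

Once contours are strings over a small alphabet, comparing them is cheap, and algorithms that expect discrete-valued input become applicable, which is the trade-off the abstract describes.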

Similar resources

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

Similarity search optimization using recently-biased symbolic representation

Dimension reduction is an important requirement for a representation that improves the efficiency of extracting interesting trend patterns from time series. Furthermore, efficient and accurate similarity search on a huge time series data set is a crucial problem in data mining preprocessing. Symbolic representations have proven to be a very effective way to reduce th...

Clustering Large Symbolic Datasets

Clustering is the process of partitioning a set of labeled/unlabeled patterns into meaningful groups so that patterns in each group/cluster are similar to each other in some sense and patterns in different clusters are dissimilar in a corresponding sense. A major outcome of the clustering process is an abstraction in the form of a description of the clusters; this abstraction can be useful in several...

A text representation language for contextual and distributional processing

This thesis examines distributional and contextual aspects of linguistic processing in relation to traditional symbolic approaches. Distributional processing is more commonly associated with statistical methods, while an integrated representation of context spanning document and syntactic structure is lacking in current linguistic representations. This thesis addresses both issues through a nov...

Foundations of Data Mining and Knowledge Discovery

This paper discusses a view that captures discovery as a translation from non-symbolic to symbolic representation. First, the relation between symbolic processing and non-symbolic processing is discussed. An intermediate form is introduced to represent both of them in the same framework and clarify the difference between the two. A characteristic of symbolic representation is to eliminate quantitative m...

Publication date: 2016